Bayesian inference as a cross-linguistic word segmentation strategy: Always learning useful things
Abstract
Statistical learning has been proposed as one of the earliest strategies infants could use to segment words out of their native language because it does not rely on language-specific cues that must be derived from existing knowledge of the words in the language. Statistical word segmentation strategies using Bayesian inference have been shown to be quite successful for English (Goldwater et al. 2009), even when cognitively inspired processing constraints are integrated into the inference process (Pearl et al. 2011, Phillips & Pearl 2012). Here we test this kind of strategy on child-directed speech from seven languages to evaluate its effectiveness cross-linguistically, with the idea that a viable strategy should succeed in each case. We demonstrate that Bayesian inference is indeed a viable cross-linguistic strategy, provided the goal is to identify useful units of the language, which can range from sub-word morphology to whole words to meaningful word combinations.
Related resources
Bayesian inference as a viable cross-linguistic word segmentation strategy: It's all about what's useful
Identifying useful items from fluent speech is one of the first tasks children must accomplish during language acquisition. Typically, this task is described as word segmentation, with the idea that words are the basic useful unit that scaffolds future acquisition processes. However, it may be that other useful items are available and easy to segment from fluent speech, such as sub-word morphol...
Evaluating language acquisition models: A utility-based look at Bayesian segmentation
Computational models of language acquisition often face evaluation issues associated with unsupervised machine learning approaches. These acquisition models are typically meant to capture how children solve language acquisition tasks without relying on explicit feedback, making them similar to other unsupervised learning models. Evaluation issues include uncertainty about the exact form of the ...
Learning Words and Their Meanings from Unsegmented Child-directed Speech
Most work on language acquisition treats word segmentation (the identification of linguistic segments from continuous speech) and word learning (the mapping of those segments to meanings) as separate problems. These two abilities develop in parallel, however, raising the question of whether they might interact. To explore the question, we present a new Bayesian segmentation model that incorporates...
Improving nonparameteric Bayesian inference: experiments on unsupervised word segmentation with adaptor grammars
One of the reasons nonparametric Bayesian inference is attracting attention in computational linguistics is because it provides a principled way of learning the units of generalization together with their probabilities. Adaptor grammars are a framework for defining a variety of hierarchical nonparametric Bayesian models. This paper investigates some of the choices that arise in formulating adap...
Utility-based evaluation metrics for models of language acquisition: A look at speech segmentation
Models of language acquisition are typically evaluated against a “gold standard” meant to represent adult linguistic knowledge, such as orthographic words for the task of speech segmentation. Yet adult knowledge is rarely the target knowledge for the stage of acquisition being modeled, making the gold standard an imperfect evaluation metric. To supplement the gold standard evaluation metric, we...
Publication date: 2014